
[Entity Analytics] Update host.ip aggregation to remove painless script#252426

Merged
ymao1 merged 2 commits into elastic:main from ymao1:ea/host-ip-agg
Feb 10, 2026

Conversation

@ymao1 (Contributor) commented Feb 9, 2026

Summary

The painless script was originally introduced into the host.ip aggregation because the normal aggregation would fail when aggregating over data where the host.ip field had a mixed mapping (mapped as keyword in one index and ip in another). With the introduction of the value_type specification in Elasticsearch, we can now choose which value type to use when there are conflicts. This PR replaces the inefficient painless script in the host.ip aggregation with a standard terms agg that uses a value_type specification.
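In aggregation-DSL terms, the change swaps a scripted terms source for a plain field plus value_type. A minimal sketch of the idea (the helper name and the simplified script placeholder are illustrative, not the actual Kibana source):

```typescript
// Illustrative sketch only; names and the script placeholder are hypothetical,
// not the actual Kibana source.

// Before: a painless script normalized mixed keyword/ip values at query time,
// paying script-execution cost per document.
const scriptedHostIpAgg = {
  terms: {
    script: {
      lang: 'painless',
      source: "/* normalize keyword- and ip-mapped values */ doc['host.ip']",
    },
    size: 10,
  },
};

// After: a standard terms agg on the field; value_type resolves
// keyword-vs-ip mapping conflicts in favor of ip.
const buildHostIpAgg = (size = 10) => ({
  terms: {
    field: 'host.ip',
    value_type: 'ip',
    size,
  },
});
```

The standard terms agg lets Elasticsearch use doc values directly instead of running a script per document.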

To Verify

Verify that the host and user flyouts show aggregated IP information

  1. Start ES and Kibana and load some entity data that includes host.ip info
  2. Open the host and user flyouts from Explore and verify that IP information is populated in the observed details

To recreate the original problem:

  1. Start ES and Kibana and go to the Dev Console
  2. Create two indices with host.ip and timestamp fields, one with host.ip mapped as keyword and one with host.ip mapped as ip, and index some documents:
Dev Console Commands
PUT hosts-keyword
{
  "mappings": {
    "properties": {
      "host.ip": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}

POST hosts-keyword/_bulk
{"index":{}}
{"host.ip":"192.168.1.1","timestamp":"2025-02-09T10:00:00Z"}
{"index":{}}
{"host.ip":"10.0.0.5","timestamp":"2025-02-09T10:01:00Z"}
{"index":{}}
{"host.ip":"172.16.0.100","timestamp":"2025-02-09T10:02:00Z"}
{"index":{}}
{"host.ip":"192.168.2.50","timestamp":"2025-02-09T10:03:00Z"}
{"index":{}}
{"host.ip":"10.0.1.20","timestamp":"2025-02-09T10:04:00Z"}
{"index":{}}
{"host.ip":"203.0.113.42","timestamp":"2025-02-09T10:05:00Z"}
{"index":{}}
{"host.ip":"198.51.100.10","timestamp":"2025-02-09T10:06:00Z"}
{"index":{}}
{"host.ip":"192.168.0.1","timestamp":"2025-02-09T10:07:00Z"}
{"index":{}}
{"host.ip":"10.10.10.10","timestamp":"2025-02-09T10:08:00Z"}
{"index":{}}
{"host.ip":"172.31.255.1","timestamp":"2025-02-09T10:09:00Z"}

PUT hosts-ip
{
  "mappings": {
    "properties": {
      "host.ip": {
        "type": "ip"
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}

POST hosts-ip/_bulk
{"index":{}}
{"host.ip":"192.168.1.1","timestamp":"2025-02-09T10:00:00Z"}
{"index":{}}
{"host.ip":"10.0.0.5","timestamp":"2025-02-09T10:01:00Z"}
{"index":{}}
{"host.ip":"172.16.0.100","timestamp":"2025-02-09T10:02:00Z"}
{"index":{}}
{"host.ip":"192.168.2.50","timestamp":"2025-02-09T10:03:00Z"}
{"index":{}}
{"host.ip":"10.0.1.20","timestamp":"2025-02-09T10:04:00Z"}
{"index":{}}
{"host.ip":"203.0.113.42","timestamp":"2025-02-09T10:05:00Z"}
{"index":{}}
{"host.ip":"198.51.100.10","timestamp":"2025-02-09T10:06:00Z"}
{"index":{}}
{"host.ip":"192.168.0.1","timestamp":"2025-02-09T10:07:00Z"}
{"index":{}}
{"host.ip":"10.10.10.10","timestamp":"2025-02-09T10:08:00Z"}
{"index":{}}
{"host.ip":"172.31.255.1","timestamp":"2025-02-09T10:09:00Z"}
  3. Try a normal terms aggregation against these two indices. The aggregation should fail, with an error in the Elasticsearch logs:
GET hosts-ip,hosts-keyword/_search
{
  "size": 0,
  "aggs": {
    "host_ip": {
      "terms": {
        "field":"host.ip",
        "size": 10,
        "order": {
          "timestamp": "desc"
        }
      },
      "aggs": {
        "timestamp": {
          "max": {
            "field": "timestamp"
          }
        }
      }
    }
  }
}
java.lang.IllegalArgumentException: Failed trying to format bytes as IP address.  Possibly caused by a mapping mismatch
  4. Add value_type to the terms aggregation. You should see an aggregation response with a shard failure indicating an illegal argument exception, but the aggregation should be performed over the correctly ip-mapped data:
GET hosts-ip,hosts-keyword/_search
{
  "size": 0,
  "aggs":{
    "host_ip": {
      "terms": {
        "field":"host.ip",
        "value_type": "ip",
        "size": 10,
        "order": {
          "timestamp": "desc"
        }
      },
      "aggs": {
        "timestamp": {
          "max": {
            "field": "timestamp"
          }
        }
      }
    }
  }
}
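With value_type: "ip", the response comes back partial instead of failing outright: buckets from the ip-mapped shards plus a shard failure for the keyword-mapped ones. A sketch of reading such a response (the response object here is mocked to mirror the shapes above, not real cluster output):

```typescript
// Mocked partial search response mirroring the example above; the bucket
// values and shard counts are illustrative, not real cluster output.
interface TermsBucket {
  key: string;
  doc_count: number;
}

const response = {
  _shards: { total: 2, successful: 1, skipped: 0, failed: 1 },
  aggregations: {
    host_ip: {
      buckets: [
        { key: '203.0.113.42', doc_count: 1 },
        { key: '198.51.100.10', doc_count: 1 },
      ] as TermsBucket[],
    },
  },
};

// Buckets only cover the ip-mapped shards; failed > 0 signals partial results.
const ips = response.aggregations.host_ip.buckets.map((b) => b.key);
const isPartial = response._shards.failed > 0;
```

The key point is that the caller still gets usable buckets; the keyword-mapped shard's failure is reported in `_shards` rather than aborting the whole search.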

@ymao1 ymao1 changed the title Changing host.ip aggregation [Entity Analytics] Update host.ip aggregation to remove painless script Feb 9, 2026
@ymao1 ymao1 self-assigned this Feb 9, 2026
@ymao1 ymao1 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:Entity Analytics Security Entity Analytics Team v9.4.0 labels Feb 9, 2026
@ymao1 ymao1 marked this pull request as ready for review February 10, 2026 00:26
@ymao1 ymao1 requested a review from a team as a code owner February 10, 2026 00:26
@elasticmachine (Contributor)

Pinging @elastic/security-entity-analytics (Team:Entity Analytics)

@elasticmachine (Contributor)

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

cc @ymao1

@abhishekbhatia1710 (Contributor) commented Feb 10, 2026

Thanks @ymao1.

I tested this locally with two indices (hosts-ip with host.ip mapped as ip, and hosts-keyword with host.ip mapped as keyword) and verified the following:

The value_type: 'ip' approach causes a shard failure on keyword-mapped indices:

"failures": [{
   "shard": 0,
   "index": "hosts-keyword",
   "reason": {
     "type": "illegal_argument_exception",
     "reason": "Field type [keyword] is incompatible with specified value_type [ip]"
   }
 }]

This means IPs that only exist in keyword-mapped indices are silently dropped from the host/user details flyout, without any warning or error shown to the user. The screenshot below shows 4 IP addresses in the hosts table but only 2 in the flyout, which are aggregated using value_type: ip.

[Screenshot: the hosts table shows 4 IP addresses; the flyout shows only 2]

I understand the performance benefit of removing the painless script. However, could you clarify:

  1. How common is it for real-world environments to have host.ip mapped as keyword in some indices? If this is rare/misconfigured, the silent data loss is likely acceptable.

  2. Should we consider surfacing the shard failure to the user (e.g., a warning indicator in the flyout) so they at least know partial results are being shown? Maybe not part of this PR, but a follow-up PR?
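A follow-up along the lines of point 2 could be as simple as inspecting the shard-failure block before rendering. A hypothetical sketch (not part of this PR; the helper name and shapes are illustrative):

```typescript
// Hypothetical follow-up sketch, not part of this PR: decide whether the
// flyout should show a partial-results warning based on shard failures.
interface ShardFailure {
  index: string;
  reason: { type: string; reason: string };
}

const shouldWarnPartialResults = (shards: {
  failed: number;
  failures?: ShardFailure[];
}): boolean =>
  shards.failed > 0 &&
  (shards.failures ?? []).some(
    (f) => f.reason.type === 'illegal_argument_exception'
  );
```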

CC : @jaredburgettelastic

@ymao1 (Contributor, Author) commented Feb 10, 2026

@abhishekbhatia1710 Thanks for desk testing!

The original painless script was introduced to handle IP data that was incorrectly keyword-mapped due to a failure on the ingest side. IP data that is properly ingested should always be ip-mapped, so having incompatible keyword and ip mappings for host.ip should be an edge case. In this edge case, however, ES does not handle the aggregation gracefully and fails catastrophically, which is why the painless script was introduced. I agree that it's not ideal that we're excluding the keyword-mapped IPs from the aggregation, but I believe this shouldn't be the typical case.

@abhishekbhatia1710 (Contributor)

> The original painless script was introduced to handle IP data that was incorrectly keyword-mapped due to a failure on the ingest side. IP data that is properly ingested should always be ip-mapped, so having incompatible keyword and ip mappings for host.ip should be an edge case. In this edge case, however, ES does not handle the aggregation gracefully and fails catastrophically, which is why the painless script was introduced. I agree that it's not ideal that we're excluding the keyword-mapped IPs from the aggregation, but I believe this shouldn't be the typical case.

Thanks for the context, that makes sense. Given this is an ingest-side edge case, I think we can consider this a tradeoff. Approving!

@abhishekbhatia1710 (Contributor) left a review

LGTM, code and desk tested!

@ymao1 ymao1 merged commit 586fcba into elastic:main Feb 10, 2026
16 checks passed
@ymao1 ymao1 deleted the ea/host-ip-agg branch February 10, 2026 14:54
mbondyra added a commit to mbondyra/kibana that referenced this pull request Feb 10, 2026
* commit '7dcc1fe3c205d2de0c3ca3f65804f21de09013c3': (285 commits)
  Enrich kbn-check-saved-objects-cli README with CI and manual usage docs (elastic#252557)
  [Discover] Add feature flag to make ESQL the default query mode (elastic#252268)
  Add maskProps.headerZindexLocation above to inspect component flyout (elastic#252543)
  [Security Solution][Atack/Alerts] Flyout header: Assignees  (elastic#252190)
  Upgrade EUI to v112.3.0 (elastic#252315)
  [Fleet] Make save_knowledge_base async in streaming state machine (elastic#252328)
  Upgrade @smithy/config-resolver 4.3.0 → 4.4.6 (elastic#252457)
  [Lens as API] Add colorMapping support for XY charts (ES|QL data layers) (elastic#252051)
  [WorkplaceAI] Add Google Drive data source and connector (elastic#250677)
  [Scout] Move GlobalSearch FTR tests to Scout (elastic#252201)
  [EDR Workflows] Fix osquery pack results display when agent clock is skewed (elastic#251417)
  [Observability Onboarding] Apply integrations limit after dedup in parseIntegrationsTSV (elastic#252486)
  [Entity Analytics] Update `host.ip` aggregation to remove painless script (elastic#252426)
  Address `@elastic/eui/require-table-caption` lint violations across `@elastic/obs-presentation-team` files (elastic#251050)
  Consolidate JSON stringify dependencies (elastic#251890)
  [index mgmt] Use esql instead of query dsl to get the index count (elastic#252422)
  Add Usage API Plugin (elastic#252434)
  Cases All Templates page (elastic#250372)
  [Agent Builder] Default value for optional params in ESQL tools (elastic#238472)
  [Fleet] Add upgrade_details.metadata.reason to AgentResponseSchema (elastic#252485)
  ...
@ymao1 ymao1 added backport:version Backport to applied version labels v9.3.2 and removed backport:skip This PR does not require backporting labels Feb 23, 2026
@kibanamachine (Contributor)

Starting backport for target branches: 9.3

https://github.com/elastic/kibana/actions/runs/22315605462

@kibanamachine (Contributor)

Starting backport for target branches: 9.3

https://github.com/elastic/kibana/actions/runs/22315605302

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 23, 2026
…ript (elastic#252426)

(cherry picked from commit 586fcba)
@kibanamachine (Contributor)

💚 All backports created successfully

Status Branch Result
9.3

Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation

kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Feb 23, 2026
…ript (elastic#252426)

(cherry picked from commit 586fcba)
@kibanamachine (Contributor)

💚 All backports created successfully

Status Branch Result
9.3

Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Feb 23, 2026
…ess script (#252426) (#254549)

# Backport

This will backport the following commits from `main` to `9.3`:
- [[Entity Analytics] Update `host.ip` aggregation to remove painless
script (#252426)](#252426)

<!--- Backport version: 9.6.6 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sorenlouv/backport)


Co-authored-by: Ying Mao <ying.mao@elastic.co>